RNA-CODE: A Noncoding RNA Classification Tool for Short Reads in NGS Data Lacking Reference Genomes
Authors
Abstract
The number of transcriptomic sequencing projects on non-model organisms is still growing rapidly. Because non-coding RNAs (ncRNAs) are highly abundant in living organisms and play important roles in many biological processes, identifying fragmentary members of ncRNA families in small RNA-seq data is an important step in post-NGS analysis. However, state-of-the-art ncRNA search tools are not optimized for next-generation sequencing (NGS) data, especially for very short reads. In this work, we propose and implement a comprehensive ncRNA classification tool (RNA-CODE) for very short reads. RNA-CODE is specifically designed for ncRNA identification in NGS data that lack quality reference genomes. Given a set of short reads, our tool classifies the reads into different types of ncRNA families. The classification results can be used to quantify the expression levels of different types of ncRNAs in RNA-seq data and to build ncRNA composition profiles in metagenomic data. Experimental results from applying RNA-CODE to RNA-seq data from Arabidopsis and to a metagenomic data set sampled from human guts demonstrate that RNA-CODE competes favorably with other tools in both sensitivity and specificity. The source code of RNA-CODE can be downloaded at http://www.cse.msu.edu/~chengy/RNA_CODE.
Similar resources
A Sensitive and Accurate protein domain cLassification Tool (SALT) for short reads
MOTIVATION: Protein domain classification is an important step in functional annotation for next-generation sequencing data. For RNA-Seq data of non-model organisms that lack quality or complete reference genomes, existing protein domain analysis pipelines are applied to short reads directly or to contigs that are generated using de novo sequence assembly tools. However, these strategies do not ...
Pervasive Transcription of Mitochondrial, Plastid, and Nucleomorph Genomes across Diverse Plastid-Bearing Species
Organelle genomes exhibit remarkable diversity in content, structure, and size, and in their modes of gene expression, which are governed by both organelle- and nuclear-encoded machinery. Next generation sequencing (NGS) has generated unprecedented amounts of genomic and transcriptomic data, which can be used to investigate organelle genome transcription. However, most of the available eukaryot...
Normalization of human RNA-seq experiments using chimpanzee RNA as a spike-in standard
Normalization of human RNA-seq experiments employing chimpanzee RNA as a spike-in standard is reported. Human and chimpanzee RNAs exhibit single nucleotide variations (SNVs) at intervals of 210 bp on average. Spike-in chimpanzee RNA would behave the same as its human counterparts throughout the NGS procedure owing to the high sequence similarity. After discrimination of species origins of the NGS ...
Statistics for Next Generation Sequencing – Meeting Report
Analysis of RNA-seq data: Profiling the transcriptome has been a central application of NGS technologies. Since the sequencing technology generates short reads, the first step is to map the reads onto the source genome, genes, and transcripts. Despite development of many algorithms and tools for mapping reads to the reference genomes, accurately mapping RNA-seq reads remains a tough problem due ...
Accurate Estimation of Expression Levels of Homologous Genes in RNA-seq Experiments
Next generation high-throughput sequencing (NGS) is poised to replace array-based technologies as the experiment of choice for measuring RNA expression levels. Several groups have demonstrated the power of this new approach (RNA-seq), making significant and novel contributions and simultaneously proposing methodologies for the analysis of RNA-seq data. In a typical experiment, millions...
Journal title:
Volume 8, issue
Pages -
Publication date: 2013